SVM

Before moving forward with the to-do list, let’s throw an SVM at it.

For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with the polynomial OLS as the baseline model, simply because the correlations made it evident that the relationship between temperature and consumption follows a polynomial shape. But now let’s see how an SVM does.

/home/runner/work/strom/strom/.venv/lib/python3.12/site-packages/sklearn/svm/_classes.py:31: FutureWarning:

The default value of `dual` will change from `True` to `'auto'` in 1.5. Set the value of `dual` explicitly to suppress the warning.

/home/runner/work/strom/strom/.venv/lib/python3.12/site-packages/sklearn/svm/_base.py:1237: ConvergenceWarning:

Liblinear failed to converge, increase the number of iterations.
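Both warnings point at concrete settings. Here is a minimal sketch of how they could be addressed, assuming the model is scikit-learn's `LinearSVR` (the warnings come from the liblinear-based linear SVM code, but the actual estimator, pipeline and data are not shown here). The synthetic temperature and consumption data, the degree-2 features and the chosen `max_iter` are all illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (NOT the data used in this post):
# consumption follows a quadratic in temperature plus noise.
rng = np.random.default_rng(0)
temperature = rng.uniform(-5, 30, size=200).reshape(-1, 1)
t = temperature[:, 0]
consumption = 20 - 0.8 * t + 0.03 * t**2 + rng.normal(0, 1, size=200)

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    # Scaling the features usually helps liblinear converge.
    StandardScaler(),
    LinearSVR(
        dual=True,         # set explicitly, which silences the FutureWarning
        max_iter=10_000,   # more iterations than the default of 1000
        random_state=0,
    ),
)
model.fit(temperature, consumption)
r2 = model.score(temperature, consumption)
print(f"R2 on the synthetic training data: {r2:.3f}")
```

If the convergence warning persists after scaling and raising `max_iter`, loosening `tol` or changing `C` are the other usual knobs to try.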

Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()
Writing pin:
Name: 'wd-svm'
Version: 20250410T160943Z-1b835
♻️  stepit 'svm_raw': is up-to-date. Using cached result for `strom.modelling.assess_model()` 2025-04-10 16:09:43

Metrics

                                                 Single Split                    CV
Metric                                       train       test       test      train
MAE - Mean Absolute Error                 2.759698   2.489427   3.087172   3.141476
MSE - Mean Squared Error                 17.777598  18.100418  15.989639  21.179925
RMSE - Root Mean Squared Error            4.216349   4.254459   3.781907   4.580536
R2 - Coefficient of Determination         0.815390   0.801454  -7.118546   0.784520
MAPE - Mean Absolute Percentage Error     0.311556   0.275527   0.660358   0.254285
EVS - Explained Variance Score            0.818426   0.803021  -1.855572   0.824080
MeAE - Median Absolute Error              2.164015   1.688725   2.542521   2.327501
D2 - D2 Absolute Error Score              0.612945   0.648707  -1.968504   0.553734
Pinball - Mean Pinball Loss               1.379849   1.244713   1.543586   1.570738
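To make the rows of the table concrete, here is a small pure-Python sketch of several of these metrics. It follows the standard textbook definitions, not necessarily the exact code behind the table, and the numbers are made up. One relationship is worth noticing: with the quantile set to 0.5, the pinball loss is exactly half the MAE, which matches the Single Split columns above (1.379849 is half of 2.759698).

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred, alpha=0.5):
    """Compute a few common regression metrics from two sequences."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mse = mean(e * e for e in errors)

    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares

    # Pinball loss: under-predictions are weighted by alpha,
    # over-predictions by (1 - alpha).
    pinball = mean(alpha * e if e >= 0 else (alpha - 1) * e for e in errors)

    return {
        "MAE": mean(abs_errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
        "MAPE": mean(abs(e) / abs(t) for e, t in zip(errors, y_true)),
        "MeAE": median(abs_errors),
        "Pinball": pinball,
    }


# Made-up observations and predictions, for illustration only.
y_true = [10.0, 12.0, 15.0, 20.0, 8.0]
y_pred = [11.0, 11.5, 14.0, 17.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:8s} {value:.6f}")
```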

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit.

  • Are they white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?
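The plots are the main tool for these checks, but a few rough numbers can back them up. The sketch below is one possible way to do it, not the diagnostics used in this post. It correlates the absolute residuals with the predictions (a crude heteroscedasticity signal) and the residuals with the squared, centred predictions (a crude non-linearity signal). The helper names and the data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def residual_checks(y_true, y_pred):
    """Return rough numeric companions to the residual plots."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    centre = mean(y_pred)
    return {
        # Should be close to 0 for an unbiased model.
        "mean_residual": mean(residuals),
        # Far from 0: the spread of the residuals changes with the prediction.
        "spread_vs_pred": pearson(y_pred, [abs(r) for r in residuals]),
        # Far from 0: the residuals curve along the predictions.
        "curvature": pearson([(p - centre) ** 2 for p in y_pred], residuals),
    }


# Made-up example where the errors grow with the predicted value.
y_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
y_true = [p + n for p, n in zip(y_pred, noise)]

checks = residual_checks(y_true, y_pred)
for name, value in checks.items():
    print(f"{name:15s} {value: .3f}")
```

These correlations only pick up simple, monotone patterns, so they complement the plots rather than replace them.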

Normality of Residuals:

Check for …

  • Are residuals normally distributed?
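Besides looking at the plot, normality can be summarised with the skewness and excess kurtosis of the residuals, combined into the Jarque-Bera statistic. This is a minimal sketch with made-up residuals. It is not necessarily what produces the plot here, and the function name is an assumption.

```python
from statistics import mean


def jarque_bera(residuals):
    """Return (skewness, excess kurtosis, Jarque-Bera statistic)."""
    n = len(residuals)
    mu = mean(residuals)
    # Central moments (population versions, dividing by n).
    m2 = sum((r - mu) ** 2 for r in residuals) / n
    m3 = sum((r - mu) ** 3 for r in residuals) / n
    m4 = sum((r - mu) ** 4 for r in residuals) / n

    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    return skewness, excess_kurtosis, jb


# Symmetric residuals: skewness is zero.
sym_skew, sym_kurt, sym_jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])

# One large positive residual: clearly right-skewed.
skw_skew, skw_kurt, skw_jb = jarque_bera([-1.0, -1.0, -1.0, -1.0, 4.0])

print(f"symmetric: skew={sym_skew:.3f} kurt={sym_kurt:.3f} JB={sym_jb:.3f}")
print(f"skewed:    skew={skw_skew:.3f} kurt={skw_kurt:.3f} JB={skw_jb:.3f}")
```

For normal residuals the statistic asymptotically follows a chi-squared distribution with 2 degrees of freedom, so values well above roughly 6 suggest non-normality at the 5% level. That approximation needs far more than the five points used in this toy example.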

Leverage

Scale-Location plot

Residuals Autocorrelation Plot
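What the autocorrelation plot shows can also be computed by hand. The sketch below, using made-up residual series, calculates the lag-k autocorrelation and the Durbin-Watson statistic. The function names are hypothetical and this is not necessarily how the plot in this post is produced.

```python
from statistics import mean


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    mu = mean(residuals)
    centred = [r - mu for r in residuals]
    denom = sum(c * c for c in centred)
    num = sum(centred[i] * centred[i - lag] for i in range(lag, len(centred)))
    return num / denom


def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 means no lag-1 autocorrelation."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denom = sum(r * r for r in residuals)
    return num / denom


# Residuals that drift slowly: positive autocorrelation, DW well below 2.
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
# Residuals that flip sign every step: negative autocorrelation, DW above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

for name, series in [("trending", trending), ("alternating", alternating)]:
    acf1 = autocorrelation(series, lag=1)
    dw = durbin_watson(series)
    print(f"{name:12s} lag-1 ACF={acf1: .3f}  Durbin-Watson={dw:.3f}")
```

With time-ordered consumption data, a slowly decaying positive autocorrelation would hint that some temporal structure (season, weekday, trend) is still left in the residuals.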

Residuals vs Time

Well, not that bad, but it seems to be overfitting quite a lot: in cross-validation the test R2 drops to -7.12, while the train R2 stays at 0.78.
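One detail in the first metrics table deserves a second look: the CV test R2 is strongly negative (-7.12), yet the CV test MAE (3.09) is about the same as the CV train MAE (3.14). A possible explanation, which is not verified on this data, is that some test folds have very little variance in the target. R2 divides by the total sum of squares of the fold, so the same absolute errors can give a good R2 on a fold with a wide range and a terrible one on a fold with a narrow range. The two folds below are made up to illustrate just that.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical fold with a wide range of target values.
wide_true = [5.0, 10.0, 15.0, 20.0, 25.0]
wide_pred = [7.0, 8.0, 17.0, 18.0, 27.0]

# Hypothetical fold where the target barely moves.
narrow_true = [9.0, 10.0, 11.0, 10.0, 10.0]
narrow_pred = [11.0, 8.0, 13.0, 8.0, 12.0]

# Every prediction is off by exactly 2.0 in both folds.
for name, yt, yp in [
    ("wide", wide_true, wide_pred),
    ("narrow", narrow_true, narrow_pred),
]:
    print(f"{name:7s} MAE={mae(yt, yp):.2f}  R2={r2_score(yt, yp): .2f}")
```

If this is what happens here, looking at the per-fold scores rather than their mean would show it quickly, and the scale-dependent metrics (MAE, RMSE) would be the more reliable ones to compare between train and test.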

♻️  stepit 'grid_search_pipe': is up-to-date. Using cached result for `strom.modelling.grid_search_pipe()` 2025-04-10 16:09:47
Writing pin:
Name: 'wd-svm'
Version: 20250410T160947Z-46ac4
♻️  stepit 'svm_tuned': is up-to-date. Using cached result for `strom.modelling.assess_model()` 2025-04-10 16:09:47

Metrics

                                                 Single Split                    CV
Metric                                       train       test       test      train
MAE - Mean Absolute Error                 2.341431   2.247375   2.147532   2.581758
MSE - Mean Squared Error                 15.653589  16.779396   7.877316  17.898406
RMSE - Root Mean Squared Error            3.956462   4.096266   2.711544   4.228775
R2 - Coefficient of Determination         0.837447   0.815945  -1.935593   0.817129
MAPE - Mean Absolute Percentage Error     0.184828   0.171403   0.474445   0.178073
EVS - Explained Variance Score            0.839705   0.841197  -1.089626   0.820615
MeAE - Median Absolute Error              1.497370   1.414702   1.790379   1.581437
D2 - D2 Absolute Error Score              0.671608   0.682863  -0.894147   0.631738
Pinball - Mean Pinball Loss               1.170716   1.123688   1.073766   1.290879

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit.

  • Are they white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?

Normality of Residuals:

Check for …

  • Are residuals normally distributed?

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

TODOs